Spectral Graph Bipartitioning

Many data clustering problems can be interpreted as clustering the vertices of a graph. The graph bipartitioning problem is to partition the vertices into subsets such that the connections within each subset are stronger than the connections between different subsets.

The vertices are partitioned into two subsets according to the signs of the components of the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix.

Prerequisites

The reader should be familiar with basic graph theory and linear algebra, in particular with eigenvalues and eigenvectors.

Competences

The reader should be able to apply graph spectral bipartitioning and recursive bipartitioning to data clustering problems.

Credits. The notebook is based on I. Mirošević, Spectral Graph Partitioning and Application to Knowledge Extraction.

Graphs

For more details, see W. H. Haemers, Matrices and Graphs and S. Butler and F. Chung, Spectral Graph Theory and the references therein.

Definitions

A weighted graph is an ordered triplet $G=(V,E,\omega)$, where $V=\{1,2,3,...,n\}$ is the set of vertices, $E=\{(i,j)\}$ is the set of edges connecting the vertices, and $\omega$ is the set of edge weights. We assume $G$ is undirected.

An adjacency matrix of graph $G$ is the matrix $A$ defined as $A_{ij}=\begin{cases} 1 \quad \textrm{if}\ (i,j)\in E, \\ 0\quad \textrm{otherwise} \end{cases}.$

A weight matrix of graph $G$ is the matrix $W$ defined as $W_{ij}=\begin{cases} \omega(e) \quad \textrm{if}\ e=(i,j)\in E, \\ 0\quad \textrm{otherwise} \end{cases}.$

A Laplacian matrix of graph $G$ is the matrix $L=D-W$, where $D=\mathop{\mathrm{diag}}(d_1,d_2,\ldots,d_n)$ with $d_i=\sum_{k=1}^n W_{ik}$ for $i=1,\ldots,n$.

A normalized Laplacian matrix is the matrix $L_n=D^{-1/2} L D^{-1/2}\equiv D^{-1/2} (D-W) D^{-1/2}$ (the scaled matrix of $L$).

An incidence matrix of graph $G$ is the $|V|\times |E|$ matrix $I_G$. Each row of $I_G$ corresponds to a vertex of $G$ and each column corresponds to an edge of $G$. In the column corresponding to an edge $e=(i,j)$, all elements are zero except the ones in the $i$-th and $j$-th rows, which are equal to $\sqrt{\omega(e)}$ and $-\sqrt{\omega(e)}$, respectively.

Examples

Graph types and algorithms are implemented in the package LightGraphs.jl.

Plotting graphs is done by the package GraphPlot.jl.

As a minor inconvenience, we can only plot unweighted graphs, so we display the weights as edge labels.


In [1]:
using LightGraphs
using GraphPlot

In [2]:
# varinfo(LightGraphs)

In [3]:
# Sources, targets and weights
n=7
sn=[1,1,1,2,2,3,3,3,5,5,6]
tn=[2,3,4,4,5,4,6,7,6,7,7]
wn=[2,3,4,7,1,3,2,1,7,3,5]
[sn tn wn]


Out[3]:
11×3 Array{Int64,2}:
 1  2  2
 1  3  3
 1  4  4
 2  4  7
 2  5  1
 3  4  3
 3  6  2
 3  7  1
 5  6  7
 5  7  3
 6  7  5

In [4]:
# Create the graph
G=Graph(n)
for i=1:length(sn)
    add_edge!(G,sn[i],tn[i])
end
G


Out[4]:
{7, 11} undirected simple Int64 graph

In [5]:
# What is the optimal bipartition?
gplot(G, nodelabel=1:n, edgelabel=wn)


Out[5]:
[gplot output: the graph drawn with vertex labels 1–7 and the edge weights shown as edge labels]

In [6]:
# Now some functions
using SparseArrays
using LinearAlgebra
function my_weight_matrix(src::Array,dst::Array,weights::Array)
    # Symmetric sparse weight matrix; the number of vertices is taken
    # from the global graph G
    n=nv(G)
    sparse([src;dst],[dst;src],[weights;weights],n,n)
end

my_laplacian(W::AbstractMatrix)=spdiagm(0=>vec(sum(W,dims=2)))-W

function my_normalized_laplacian(L::AbstractMatrix)
    # D holds the diagonal of D^{-1/2}; scale L symmetrically from both sides
    D=1.0./sqrt.(diag(L))
    n=length(D)
    [L[i,j]*(D[i]*D[j]) for i=1:n, j=1:n]
end


Out[6]:
my_normalized_laplacian (generic function with 1 method)

In [7]:
W=my_weight_matrix(sn,tn,wn)


Out[7]:
7×7 SparseMatrixCSC{Int64,Int64} with 22 stored entries:
  [2, 1]  =  2
  [3, 1]  =  3
  [4, 1]  =  4
  [1, 2]  =  2
  [4, 2]  =  7
  [5, 2]  =  1
  [1, 3]  =  3
  [4, 3]  =  3
  [6, 3]  =  2
  [7, 3]  =  1
  [1, 4]  =  4
  [2, 4]  =  7
  [3, 4]  =  3
  [2, 5]  =  1
  [6, 5]  =  7
  [7, 5]  =  3
  [3, 6]  =  2
  [5, 6]  =  7
  [7, 6]  =  5
  [3, 7]  =  1
  [5, 7]  =  3
  [6, 7]  =  5

In [8]:
Matrix(W)


Out[8]:
7×7 Array{Int64,2}:
 0  2  3  4  0  0  0
 2  0  0  7  1  0  0
 3  0  0  3  0  2  1
 4  7  3  0  0  0  0
 0  1  0  0  0  7  3
 0  0  2  0  7  0  5
 0  0  1  0  3  5  0

In [9]:
L=my_laplacian(W)
Matrix(L)


Out[9]:
7×7 Array{Int64,2}:
  9  -2  -3  -4   0   0   0
 -2  10   0  -7  -1   0   0
 -3   0   9  -3   0  -2  -1
 -4  -7  -3  14   0   0   0
  0  -1   0   0  11  -7  -3
  0   0  -2   0  -7  14  -5
  0   0  -1   0  -3  -5   9

In [10]:
Ln=my_normalized_laplacian(L)


Out[10]:
7×7 Array{Float64,2}:
  1.0       -0.210819   -0.333333  …   0.0         0.0        0.0
 -0.210819   1.0         0.0          -0.0953463   0.0        0.0
 -0.333333   0.0         1.0           0.0        -0.178174  -0.111111
 -0.356348  -0.591608   -0.267261      0.0         0.0        0.0
  0.0       -0.0953463   0.0           1.0        -0.564076  -0.301511
  0.0        0.0        -0.178174  …  -0.564076    1.0       -0.445435
  0.0        0.0        -0.111111     -0.301511   -0.445435   1.0

In [11]:
issymmetric(Ln)


Out[11]:
true

In [12]:
# Let us compute the incidence matrix
function my_incidence_matrix(G::Graph, weights::Array)
    # |V| × |E| matrix: each column carries ±√ω(e) in the rows of the edge's endpoints
    A=zeros(nv(G),ne(G))
    k=1
    for a in edges(G)
        A[a.dst,k]=sqrt(weights[k])
        A[a.src,k]=-sqrt(weights[k])
        k+=1
    end
    A
end


Out[12]:
my_incidence_matrix (generic function with 1 method)

In [13]:
Iᵧ=my_incidence_matrix(G,wn)


Out[13]:
7×11 Array{Float64,2}:
 -1.41421  -1.73205  -2.0   0.0      …   0.0   0.0       0.0       0.0
  1.41421   0.0       0.0  -2.64575      0.0   0.0       0.0       0.0
  0.0       1.73205   0.0   0.0         -1.0   0.0       0.0       0.0
  0.0       0.0       2.0   2.64575      0.0   0.0       0.0       0.0
  0.0       0.0       0.0   0.0          0.0  -2.64575  -1.73205   0.0
  0.0       0.0       0.0   0.0      …   0.0   2.64575   0.0      -2.23607
  0.0       0.0       0.0   0.0          1.0   0.0       1.73205   2.23607

Facts

  1. $L=I_{G}I_{G}^{T}$.

  2. $L$ is a symmetric positive semidefinite (PSD) matrix.

  3. $L\mathbf{1}=0$ for $\mathbf{1}=[1,...,1]^{T}$, thus $0$ is an eigenvalue of $L$ and $\mathbf{1}$ is the corresponding eigenvector.

  4. If $G$ has $c$ connected components, then $L$ has $c$ eigenvalues equal to $0$.

  5. For every $x\in \mathbb{R}^{n}$, it holds $x^{T}L x=\sum\limits_{i<j}W_{ij}(x_{i}-x_{j})^{2}$.

  6. For every $x\in\mathbb{R}^{n}$ and $\alpha,\beta\in\mathbb{R}$, it holds $(\alpha x+\beta \mathbf{1})^{T} L (\alpha x+\beta \mathbf{1}) =\alpha^{2} x^{T}L x$.

  7. Assume that the eigenvalues of $L$ are increasingly ordered. Then, $$ 0=\lambda_1(L)\leq \lambda_2(L)\leq \cdots \leq\lambda_{n}(L)\leq 2\max\limits_{i=1,\cdots ,n}d_{i}. $$

  8. $\sigma(L_n) \subseteq [0,2]$.

Examples


In [14]:
# Fact 1
norm(L-Iᵧ*Iᵧ')


Out[14]:
3.607797795906347e-15

In [15]:
# Facts 2 and 7
using Arpack
issymmetric(L), eigs(L)[1], 2*maximum(diag(L))


Out[15]:
(true, [20.31827114106893, 20.051835987346976, 12.69129560343135, 12.488755596470586, 8.50014109186039, 1.9497005798217644], 28)

In [16]:
# Fact 3
L*ones(n)


Out[16]:
7-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [17]:
# Fact 5
x=rand(n)
x'*L*x, sum([W[i,j]*(x[i]-x[j])^2 for i=1:n, j=1:n])/2


Out[17]:
(1.7142648263034737, 1.714264826303474)

In [18]:
# Fact 6
α,β=rand(),rand()
(α*x+β*ones(n))'*L*(α*x+β*ones(n)), α^2*x'*L*x


Out[18]:
(0.1861415183067141, 0.18614151830671471)

In [19]:
# Fact 8
eigvals(Ln)


Out[19]:
7-element Array{Float64,1}:
 2.185751579730777e-16
 0.17558968364154215
 0.887695270067757
 1.2900045462017005
 1.3691958920332263
 1.6212328903561466
 1.6562817176996276
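
Fact 4 can be checked as well. Here is a minimal sketch on a made-up 4-vertex graph with two connected components, $\{1,2\}$ and $\{3,4\}$, reusing my_laplacian from above; its Laplacian should have two zero eigenvalues.

# Fact 4 - two connected components give two zero eigenvalues
W₄=[0 1 0 0; 1 0 0 0; 0 0 0 2; 0 0 2 0]
L₄=my_laplacian(W₄)
eigvals(Matrix(L₄))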

Bipartitioning

Definitions

Let $\pi=\{V_{1},V_{2}\}$ be a partition of $V$ with $V_1,V_2\neq \emptyset$, $V_1\cup V_2=V$, and $V_1\cap V_2=\emptyset$.

A cut of a partition $\pi$ is the sum of weights of all edges between $V_1$ and $V_2$,

$$\mathop{\mathrm{cut}}(\pi)\equiv \mathop{\mathrm{cut}}(V_1,V_2)=\sum\limits_{{\displaystyle i\in V_{1} \atop \displaystyle j\in V_{2}}}W_{ij}.$$

A weight of a vertex $i\in V$ is the sum of the weights of all edges emanating from $i$, $\omega(i)=\sum\limits_{j=1}^{n}W_{ij}$.

A weight of a subset $\bar V\subset V$ is the sum of the weights of all vertices in $\bar V$, $\omega(\bar V)=\sum\limits_{\displaystyle i\in\bar V} \omega(i)$.

A proportional cut of a partition $\pi$ is

$$ \mathop{\mathrm{pcut}}(\pi)=\displaystyle\frac{\mathop{\mathrm{cut}}(\pi)}{|V_{1}|}+\frac{\mathop{\mathrm{cut}}(\pi)}{|V_{2}|}. $$

A normalized cut of a partition $\pi$ is

$$ \mathop{\mathrm{ ncut}}(\pi)=\displaystyle\frac{\mathop{\mathrm{cut}}(\pi)}{\omega(V_{1})}+\frac{\mathop{\mathrm{cut}}(\pi)}{\omega(V_{2})}. $$
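
These quantities are easy to compute directly from the weight matrix. Here is a minimal sketch for the 7-vertex example graph above; the bipartition $V_1=\{1,2,3,4\}$, $V_2=\{5,6,7\}$ is chosen purely for illustration.

# cut, pcut and ncut of one bipartition of the example graph
V₁=[1,2,3,4]
V₂=[5,6,7]
cut=sum(W[i,j] for i in V₁, j in V₂)
# Weight of a subset = sum of the weights of its vertices
ω(S)=sum(sum(W[i,:]) for i in S)
pcut=cut/length(V₁)+cut/length(V₂)
ncut=cut/ω(V₁)+cut/ω(V₂)
cut, pcut, ncut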

Example

Consider the following partitions (all edges have unit weights):

The left partition $\pi$ vs. the right partition $\pi^{\prime}$:

$\mathop{\mathrm{cut}}(\pi)=2$ vs. $\mathop{\mathrm{cut}}(\pi^{\prime})=3$

$\mathop{\mathrm{pcut}}(\pi)=\frac{2}{1}+\frac{2}{11}=2.18$ vs. $\mathop{\mathrm{pcut}}(\pi^{\prime})=\frac{3}{6}+\frac{3}{6}=1$

$\mathop{\mathrm{ncut}}(\pi)=\frac{2}{2}+\frac{2}{50}=1.04$ vs. $\mathop{\mathrm{ncut}}(\pi^{\prime})=\frac{3}{27}+\frac{3}{25}=0.23$

Facts

  1. The informal description of the bipartitioning problem can be formulated as two problems, $$ \mathop{\textrm{arg min}}\limits_{\pi} \mathop{\mathrm{pcut}}(\pi) \quad \textrm{or} \quad \mathop{\textrm{arg min}}\limits_{\pi} \mathop{\mathrm{ncut}}(\pi). $$ The first problem favors partitions into subsets with similar numbers of vertices, while the second problem favors partitions into subsets with similar weights.

  2. Both problems are NP-hard.

  3. Approximate solutions can be computed by suitable relaxations in $O(n^2)$ operations.

  4. The partition $\pi$ is defined by the vector $y$ such that $$ y_{i}= \begin{cases} \frac{1}{2} & \text{for } i\in V_1 \\ -\frac{1}{2} & \text{for } i\in V_2 \end{cases} $$ The proportional cut problem can be formulated as the discrete proportional cut problem $$ \underset{\displaystyle \big|\mathbf{y}^{T}\mathbf{1} \big|\leq \beta} {\min\limits_{\displaystyle y_{i}\in \{-1/2,1/2\}}} \frac{1}{2}\sum_{i,j}(y_{i}-y_{j})^{2}W_{ij}. $$ Parameter $\beta$ controls the number of vertices in each subset.

  5. The normalized cut problem can be formulated as the discrete normalized cut problem $$ \underset{\displaystyle \big|y^{T}D\mathbf{1} \big|\leq \beta} {\min\limits_{\displaystyle y_{i}\in \{-1/2,1/2\}}} \frac{1}{2}\sum_{i,j}(y_{i}-y_{j})^{2}W_{ij}. $$ Parameter $\beta$ controls the weights of each subset.

  6. Using the Fact 5 above, the discrete proportional cut problem can be formulated as the relaxed proportional cut problem $$ \underset{\displaystyle y^{T}y=1}{\underset{\displaystyle \big| y^{T}\mathbf{1} \big| \leq 2\frac{\beta}{\sqrt{n}}} {\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}} y^{T}L y. $$ Similarly, the discrete normalized cut problem can be formulated as the relaxed normalized cut problem $$ \underset{\displaystyle y^{T}Dy=1}{\underset{\displaystyle \big| y^{T}D\mathbf{1}\big| \leq \displaystyle \frac{\beta}{\sqrt{\theta n}}}{\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}}y^{T}L_n y. $$

  7. The Main Theorem. Let $A\in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues $\lambda _{1}<\lambda _{2}<\lambda_{3}\leq \cdots \leq \lambda _{n}$ and let $v^{[1]},v^{[2]},\ldots,v^{[n]}$ be the corresponding eigenvectors. For the fixed $0\leq \alpha <1$, the solution of the problem $$ \underset{\displaystyle y^{T}y=1}{\underset{\displaystyle \left|y^{T}v^{[1]}\right|\leq \alpha} {\min\limits_{\displaystyle y\in \mathbb{R}^{n}}}} y^{T}Ay $$ is $y=\pm \alpha v^{[1]}\pm \sqrt{1-\alpha^{2}}v^{[2]}$. (For the proof see D. J. Higham and M. Kibble, A Unified View of Spectral Clustering.)

  8. For $0\leq \beta <\frac{n}{2}$, the solution of the relaxed proportional cut problem is $$ y=\pm \frac{2\beta}{n\sqrt{n}}\mathbf{1}\pm \sqrt{1-4\frac{\beta ^{2}}{n^{2}}}v^{[2]}, $$ where $v^{[2]}$ is an eigenvector corresponding to $\lambda_2(L)$. $v^{[2]}$ is called the Fiedler vector. Since the first summand carries no information, $V$ is partitioned according to the signs of the components of $v^{[2]}$: $$ V_{1}=\{i:v^{[2]}_i <0\}, \quad V_{2}=\{i:v^{[2]}_i \geq 0\}. $$ Notice that the value of $\beta$ is irrelevant for the solution.

  9. For $0\leq \beta <\sqrt{\theta n}\left\Vert D^{\frac{1}{2}}\mathbf{1} \right\Vert _{2}$, the solution of the relaxed normalized cut problem is $$ y=\pm \frac{\beta }{\sqrt{\theta n}\left\Vert D^{\frac{1}{2}} \mathbf{1}\right\Vert _{2}^{2}}\mathbf{1}\pm \sqrt{1-\frac{\beta ^{2}}{ \theta n\left\Vert D^{\frac{1}{2}}\mathbf{1}\right\Vert _{2}^{2}}}D^{-\frac{1 }{2}} v^{[2]}, $$ where $v^{[2]}$ is an eigenvector corresponding to $\lambda_2(L_n)$. $V$ is partitioned according to the signs of the components of $v^{[2]}$, as above.

  10. Neither of the relaxed algorithms is guaranteed to solve the true (proportional / normalized) cut problem exactly. However, the computed solutions point in the right direction. Whether to use the proportional or the normalized cut formulation depends on the specific problem.


In [20]:
Matrix(L)


Out[20]:
7×7 Array{Int64,2}:
  9  -2  -3  -4   0   0   0
 -2  10   0  -7  -1   0   0
 -3   0   9  -3   0  -2  -1
 -4  -7  -3  14   0   0   0
  0  -1   0   0  11  -7  -3
  0   0  -2   0  -7  14  -5
  0   0  -1   0  -3  -5   9

In [21]:
# Voila!
λ,v=eigs(L,nev=2,which=:SM, v0=ones(n))


Out[21]:
([2.5376526277146454e-16, 1.9497005798217653], [-0.3779644730092273 -0.3832592806746632; -0.3779644730092273 -0.37308387896785944; … ; -0.3779644730092271 0.40850575685630564; -0.3779644730092272 0.44940824470088897], 2, 1, 7, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

In [22]:
v


Out[22]:
7×2 Array{Float64,2}:
 -0.377964  -0.383259
 -0.377964  -0.373084
 -0.377964  -0.145189
 -0.377964  -0.380089
 -0.377964   0.423708
 -0.377964   0.408506
 -0.377964   0.449408
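
Reading off the signs of the second column (the Fiedler vector) gives the bipartition from Fact 8; a minimal sketch:

# Bipartition of the example graph from the signs of the Fiedler vector
V₁=findall(v[:,2].<0)
V₂=findall(v[:,2].>=0)
V₁, V₂

From the output above, this yields $V_1=\{1,2,3,4\}$ and $V_2=\{5,6,7\}$.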

In [23]:
λ,v=eigs(Ln,nev=2,which=:SM, v0=ones(n))


Out[23]:
([2.6294755846385285e-17, 0.17584633065651978], [-0.3441236008058427 0.3380254462716218; -0.3627381250550059 0.3610656555558019; … ; -0.429197537639476 -0.4570992807918485; -0.3441236008058426 -0.38825182355842464], 2, 1, 7, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

In [24]:
v


Out[24]:
7×2 Array{Float64,2}:
 -0.344124   0.338025
 -0.362738   0.361066
 -0.344124   0.126062
 -0.429198   0.45732
 -0.380443  -0.413108
 -0.429198  -0.457099
 -0.344124  -0.388252

Example - Concentric circles

A complete graph has edges connecting each pair of vertices.

To a set of points $X=\{x_{1},x_{2},\cdots ,x_{m}\}$, where $x_{i}\in\mathbb{R}^{n}$, we assign a weighted complete graph $G=(V,E)$ with $m$ vertices, where the vertex $j\in V$ corresponds to the point $x_j\in X$.

The main idea is to assign to each edge $e=(i,j)$ a weight which reflects the distance between $x_i$ and $x_j$, something like $\omega(e)=\displaystyle\frac{1}{\mathop{\mathrm{dist}}(x_i,x_j)}$.

However, this has to be implemented with care. For example, using the simple Euclidean distance yields the same results as the function kmeans(). In this example we use the Gaussian kernel, that is $$ \omega(e)=e^{\displaystyle -\|x_i-x_j\|_2^2/\sigma^2}, $$ where the choice of $\sigma$ is based on experience.

The computation of various distances is implemented in the package Distances.jl.

We will construct the Laplacian matrix directly.


In [25]:
using Plots
using Distances

In [29]:
# Generate two concentric rings
k=2
import Random
Random.seed!(421)
# Center
center=[rand(-5:5);rand(-5:5)]
# Radii
radii=Random.randperm(10)[1:k]
# Number of points in circles
sizes=rand(1000:2000,k)
# radii=[1;10]
center,radii,sizes


Out[29]:
([-3, -3], [8, 5], [1109, 1478])

In [30]:
# Points
m=sum(sizes)
X=Array{Float64}(undef,2,m)
first=0
last=0
for j=1:k
    first=last+1
    last=last+sizes[j]
    # Random angles
    ϕ=2*π*rand(sizes[j])
    for i=first:last
        l=i-first+1
        X[:,i]=center+radii[j]*[cos(ϕ[l]);sin(ϕ[l])]+(rand(2).-0.5)/50
    end
end
scatter(X[1,:],X[2,:],aspect_ratio=:equal)


Out[30]:

In [31]:
# Weight matrix
W=1 ./pairwise(SqEuclidean(),X,dims=2)


Out[31]:
2587×2587 Array{Float64,2}:
 Inf           0.00698522   0.0039519   …   0.00687117   0.110614
  0.00698522  Inf           0.00715657      0.0383063    0.0101791
  0.0039519    0.00715657  Inf              0.0219643    0.00598839
  0.0179405    0.00414001   0.00566458      0.0059627    0.0227392
  0.00390593   0.00942131   0.224446        0.0338223    0.00592176
  0.024113     0.0224184    0.0043046   …   0.0116898    0.0287185
  0.00948816   0.00390503   0.00806857      0.00645292   0.0133272
  0.0193335    0.0286315    0.00446678      0.0129597    0.0242862
  0.00610924   0.568912     0.00845188      0.0534656    0.00901073
  0.00478183   0.00459447   0.0352373       0.0102413    0.00715309
  0.0250968    0.00433321   0.00513098  …   0.00591145   0.0293534
  0.00402184   0.0063856    0.891741        0.0182261    0.00608611
  0.01265      0.00397776   0.00661426      0.00613233   0.0170769
  ⋮                                     ⋱   ⋮           
  0.0225906    0.00623894   0.00841257  …   0.0100816    0.0452228
  0.0276853    0.00645205   0.00780388      0.0100002    0.0587574
  0.0479181    0.00721037   0.00681472      0.0101401    0.134106
  0.0163954    0.00599495   0.00993231      0.0104636    0.0307172
  0.00948233   0.107064     0.0112166       0.123623     0.0166397
  0.104761     0.0111673    0.00594337  …   0.012337     3.39775
  0.0244693    0.0319884    0.00670456      0.0232981    0.0505603
  0.00603174   0.0172592    0.0542486       0.159681     0.0102228
  0.00692692   0.00713047   0.0437036       0.0194618    0.0118203
  0.0437858    0.019157     0.00614175      0.0168739    0.116956
  0.00687117   0.0383063    0.0219643   …  Inf           0.0117438
  0.110614     0.0101791    0.00598839      0.0117438   Inf

In [32]:
# Laplacian matrix
for i=1:m
    W[i,i]=0
end
L=Diagonal(vec(sum(W,dims=2)))-W
# Check Fact 3
norm(L*ones(m))


Out[32]:
1.3077120790844576e-7

In [33]:
# Notice λ₁
λ,v=eigs(L,nev=2,which=:SM, v0=ones(m))


Out[33]:
([7.160820577440386e-11, 30.365789671697673], [0.019660827175972527 -0.03255526844539958; 0.01966082717593583 0.008755268312499493; … ; 0.01966082717574732 0.017703645546326842; 0.01966082717596733 -0.022086991274533254], 2, 2, 37, [2.55257434667484e-5, 0.0001654648947596821, 5.704761899245494e-5, -6.018821230703598e-6, 6.189360214732853e-5, 5.661296661405394e-5, 6.4121600278476e-5, 7.451138852017913e-5, 4.527918544332612e-5, 5.9988505274990084e-5  …  -4.351140960073655e-5, -2.415315081878191e-5, 4.0996119661116924e-5, -4.622174249565201e-6, -0.0001152668085136485, 6.812641957846505e-5, -9.767498014601361e-6, 8.451992776861339e-6, -5.979627840705711e-6, 8.481164963494607e-6])

In [34]:
# Define clusters
C=ones(Int64,m)
C[findall(v[:,2].>0)].=2


Out[34]:
1319-element view(::Array{Int64,1}, [2, 3, 5, 9, 10, 12, 14, 16, 20, 22  …  2562, 2564, 2567, 2571, 2572, 2574, 2580, 2583, 2584, 2586]) with eltype Int64:
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 ⋮
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2

In [35]:
# Yet another plotting function
function plotKpartresult(C::Vector,X::Array)
    k=maximum(C)
    # Clusters
    scatter(X[1,findall(C.==1)],X[2,findall(C.==1)])
    for j=2:k
        scatter!(X[1,findall(C.==j)],X[2,findall(C.==j)])
    end
    scatter!(aspect_ratio=:equal)
end


Out[35]:
plotKpartresult (generic function with 1 method)

In [36]:
plotKpartresult(C,X)


Out[36]:

This is the same partitioning as obtained by kmeans(). Let us try the Gaussian kernel. A rule of thumb: if the rings are close, use $\sigma<1$; if the rings are far apart, use $\sigma>1$.


In [37]:
σ=1.2 # 0.1
W=exp.(-pairwise(SqEuclidean(),X,dims=2)/σ^2)-I
L=diagm(0=>vec(sum(W,dims=2)))-W
λ,v=eigs(L,nev=2,which=:SM, v0=ones(m))
C=ones(Int64,m)
C[findall(v[:,2].>0)].=2
plotKpartresult(C,X)


Out[37]:

Recursive bipartitioning

Definitions

Let $G=(V,E)$ be a weighted graph with weights $\omega$.

Let $\pi_k =\{V_{1},V_{2},...,V_{k}\}$ be a $k$-partition of $V$, with $V_i\neq \emptyset$ for $i=1,\ldots,k$.

The previous definition of $\mathop{\mathrm{cut}}(\pi)\equiv \mathop{\mathrm{cut}}(\pi_2)$ extends naturally to $k$-partitions. A cut of a partition $\pi_k$ is

$$ \mathop{\mathrm{cut}}(\pi_k)=\sum\limits_{\displaystyle i<j} \mathop{\mathrm{cut}}(V_{i},V_{j}), $$

where $\mathop{\mathrm{cut}}(V_{i},V_{j})$ is interpreted as the cut of the bipartition of the subgraph of $G$ with vertices $V_i\cup V_j$.

A proportional cut of a partition $\pi_k$ is

$$ \mathop{\mathrm{pcut}}(\pi_k)=\underset{i<j}{\sum\limits_{i,j=1}^{k}} \left( \frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{|V_{i}|}+\frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{|V_{j}|}\right) = \sum_{i=1}^{k}\frac{\mathop{\mathrm{cut}}(V_{i},V\backslash V_{i})}{|V_{i}|}. $$

A normalized cut of a partition $\pi_k$ is

$$ \mathop{\mathrm{ncut}}(\pi_k)=\underset{i<j}{\sum\limits_{i,j=1}^{k}} \left( \frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{\omega(V_{i})}+\frac{\mathop{\mathrm{cut}}(V_{i},V_{j})}{\omega(V_{j})}\right) = \sum_{i=1}^{k}\frac{\mathop{\mathrm{cut}}(V_{i},V\backslash V_{i})}{ \omega(V_{i})}. $$

Fact

If we want to cluster vertices of graph $G=(V,E)$ into $k$ clusters, we can apply the following recursive algorithm:

  1. Initialization. Compute the bipartition $\pi=\{V_{1},V_{2}\}$ of $V$. Set the counter $c=2$.

  2. Recursion. While $c<k$ repeat:

    1. Compute the bipartition of each subset in the current partition.

    2. Among all $(c+1)$-partitions, choose the one with the smallest $\mathop{\mathrm{pcut}}(\pi_{c+1})$ or $\mathop{\mathrm{ncut}}(\pi_{c+1})$, respectively.

    3. Set $c=c+1$.

  3. Stop.

There is no guarantee of optimality for this algorithm. Clearly, the optimal $k$-partition may be a subpartition of one of the discarded partitions.
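
The recursive algorithm above can be sketched in a few lines of Julia. This is only an illustration for the proportional cut, not an optimized implementation; the helper names fiedler_split, prop_cut and recursive_pcut are ours, and a dense eigendecomposition is used for every split.

using LinearAlgebra

# Fiedler-vector bipartition of the subgraph induced by the vertex subset S
function fiedler_split(W::AbstractMatrix, S::Vector{Int})
    Ws=W[S,S]
    Ls=Diagonal(vec(sum(Ws,dims=2)))-Ws
    v=eigen(Symmetric(float(Matrix(Ls)))).vectors[:,2]
    S[findall(v.<0)], S[findall(v.>=0)]
end

# Proportional cut of a partition given as a vector of vertex subsets
prop_cut(W::AbstractMatrix, π)=sum(sum(W[S,setdiff(1:size(W,1),S)])/length(S) for S in π)

# Recursive bipartitioning: split one subset at a time and keep the
# (c+1)-partition with the smallest proportional cut
function recursive_pcut(W::AbstractMatrix, k::Int)
    π=[collect(1:size(W,1))]
    while length(π)<k
        best=π; bestval=Inf
        for i=1:length(π)
            length(π[i])<2 && continue
            A,B=fiedler_split(W,π[i])
            candidate=[π[1:i-1]; [A]; [B]; π[i+1:end]]
            val=prop_cut(W,candidate)
            if val<bestval
                best=candidate; bestval=val
            end
        end
        bestval==Inf && break   # no further split is possible
        π=best
    end
    π
end

# For example: recursive_pcut(Matrix(my_weight_matrix(sn,tn,wn)), 3)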